Algorithms and Framework for Energy Efficient Parallel Stream Computing on Many-Core Architectures
نویسنده
چکیده
The rise of many-core processor architectures in the market answers to a constantly growing need of processing power to solve more and more challenging problems such as the ones in computing for big data. Fast computation is more and more limited by the very high power required and the management of the considerable heat produced. Many programming models compete to take profit of many-core architectures to improve both execution speed and energy consumption, each with their advantages and drawbacks. The work described in this thesis is based on the dataflow computing approach and investigates the benefits of a carefully pipelined execution of streaming applications, focusing in particular on offand on-chip memory accesses. As case study, we implement classic and on-chip pipelined versions of mergesort for Intel SCC and Xeon. We see how the benefits of the on-chip pipelining technique are bounded by the underlying architecture, and we explore the problem of fine tuning streaming applications for many-core architectures to optimize for energy given a throughput budget. We propose a novel methodology to compute schedules optimized for energy efficiency given a fixed throughput target. We introduce Drake, derived from Schedeval, a tool that generates pipelined applications for Many-Core architectures and allows the performance testing in time or energy of their static schedule. We show that streaming applications based on Drake compete with specialized implementations and we use Schedeval to demonstrate performance differences between schedules that are otherwise considered as equivalent by a simple model. This work has been supported in parts by CUGS (the Graduate School in Computer Science, Sweden), Vetenskapsrådet, SeRC and EU FP7 EXCESS. Department of Computer and Information Science Linköping University SE-581 83 Linköping, Sweden
منابع مشابه
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
متن کاملEfficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
متن کاملRA-LPEL: a Resource-Aware Light-weight Parallel Execution Layer for reactive stream processing networks on the SCC many-core tiled architecture
In computing the available computing power has continuously fallen short of the demanded computing performance. As a consequence, performance improvement has been the main focus of processor design. However, due to the phenomenon called “Power Wall” it has become infeasible to build faster processors by just increasing the processor’s clock speed. One of the resulting trends in hardware design ...
متن کاملEnergy Aware Resource Management of Cloud Data Centers
Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Virtualization technology forms a key concept for new cloud computing architectures. The data centers are used to provide cloud services burdening a significant...
متن کاملIntelligent Trace and Evaluation for Parallel Programming Based on Architectural Details
As CMP became the main stream of processer design, parallel programming is a new challenge for programmer. The execution of the same program may perform much different based on various multi-core architectures. Even the same multi-core processor combined with different mapping strategies are still with distinct performance. How could programmers figure out if their programs, which based on spec...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016